Solving the mushroom classification problem from https://github.com/pbiecek/InterpretableMachineLearning2020/issues/5

I will use a random forest as the first model.

Load the data

The target class is in the column "class": "p" means poisonous, "e" means edible. Let's preprocess the data, i.e. encode the classes.
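A minimal sketch of this preprocessing step. The toy rows, the feature column names and the choice of one-hot encoding are assumptions for illustration; only the "class" column and its "p"/"e" values come from the description above.

```python
import pandas as pd

# Hypothetical toy rows mimicking the layout of the mushroom data.
df = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "odor": ["f", "n", "a", "f"],
    "gill-size": ["n", "b", "b", "n"],
})

# Encode the target: 1 = poisonous, 0 = edible.
y = df["class"].map({"p": 1, "e": 0})

# One-hot encode the categorical features (an assumed encoding).
X = pd.get_dummies(df.drop(columns="class"))

print(list(X.columns))
```

One-hot columns keep each category value separate, which makes the later per-feature attributions easier to read than integer label codes.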

Train and evaluate model
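A sketch of how training and evaluation could look. The later comparison mentions a random forest, so this assumes that is the model trained here. The data is a synthetic stand-in (hypothetical) in which odor alone decides the class, and the split ratio and hyperparameters are assumptions too.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the mushroom data (hypothetical): the label depends
# only on odor, mimicking a dataset with very few decisive features.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "odor": rng.choice(["a", "f", "n"], size=n),
    "gill-size": rng.choice(["b", "n"], size=n),
    "cap-shape": rng.choice(["x", "f"], size=n),
})
y = (df["odor"] == "f").astype(int)  # 1 = poisonous, 0 = edible
X = pd.get_dummies(df)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```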

We got perfect accuracy. Now let's check the LIME explanations.
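To make it clearer what a LIME explanation measures, here is a simplified sketch of the idea for binary features: perturb the instance, weight the perturbed samples by proximity, and fit a weighted linear surrogate whose coefficients serve as attributions. This is not the `lime` package itself. The helper `local_attributions`, the flip-based sampling, the kernel and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def local_attributions(predict_proba, x, n_samples=2000, kernel_width=0.75, seed=0):
    """LIME-style sketch: perturb x, weight by proximity, fit a linear surrogate."""
    rng = np.random.default_rng(seed)
    # Flip each binary feature of x with probability 0.5.
    flips = rng.integers(0, 2, size=(n_samples, x.size))
    Z = np.abs(x - flips)
    # Distance = fraction of flipped features; closer samples get more weight.
    distance = flips.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Fit the surrogate to the model's probability of the positive class.
    target = predict_proba(Z)[:, 1]
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    return surrogate.coef_


# Hypothetical data: feature 0 alone decides the class.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 5))
y = X[:, 0]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

attributions = local_attributions(model.predict_proba, X[0])
for index, value in enumerate(attributions):
    print(f"feature {index}: {value:+.3f}")
```

The sign of each coefficient says whether the feature pushes the prediction toward or away from the positive class, and the magnitude says how strongly the prediction reacts to changing it near this instance.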

As we see, in all explanations almost all important features have the same sign and moderate values. The examples suggest that odor and gill size are the deciding factors.

Let's train another model (logistic regression) and compare the explanations.

As we see, the explanations differ between the models. Both models achieve 100% accuracy and are almost 100% sure on all example instances, but logistic regression has much larger attributions. The signs of the attributions and the order of the most important features are similar in both models.

As we know from the analysis of this task in earlier classes, there are very few features that correctly classify almost all cases. Our explanations also suggest that gill size and odor are the most important features.

Attributions for logistic regression have higher values than those for the random forest. This may mean that the logistic regression model has higher local variability in its predictions. Both models give the same answers on real data, so such behavior would mean that they differ on the synthetic perturbed samples generated by LIME.
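One way to probe this hypothesis is to compare how much the two models disagree on real rows versus on perturbed rows. The sketch below does this on synthetic data (hypothetical) with two perfectly correlated decisive features, so that random perturbation produces feature combinations never seen in training. The flip-based perturbation is a simplification of what LIME does, and the numbers say nothing about the actual mushroom models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical data: features 0 and 1 are identical and decide the class,
# the remaining features are noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6))
X[:, 1] = X[:, 0]
y = X[:, 0]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)


def disagreement(rows):
    """Mean absolute difference between the two models' class-1 probabilities."""
    p_forest = forest.predict_proba(rows)[:, 1]
    p_logreg = logreg.predict_proba(rows)[:, 1]
    return float(np.mean(np.abs(p_forest - p_logreg)))


# Perturb one instance by flipping each feature with probability 0.5, which
# also yields rows where features 0 and 1 differ (unseen in training).
x = X[0]
flips = rng.integers(0, 2, size=(1000, x.size))
perturbed = np.abs(x - flips)

on_real = disagreement(X)
on_perturbed = disagreement(perturbed)
print(f"disagreement on real rows:      {on_real:.3f}")
print(f"disagreement on perturbed rows: {on_perturbed:.3f}")
```

If the disagreement on perturbed rows is clearly larger than on real rows, that would support the idea that the models differ mainly away from the data.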